

Section: New Results

Scene and camera reconstruction

Participants : Marie-Odile Berger, Srikrishna Bhat, Nicolas Noury, Gilles Simon, Frédéric Sur.

Image point correspondences and repeated patterns

Matching or tracking interest points between several views is one of the keystones of many computer vision applications, especially structure and motion estimation. The procedure generally consists of several independent steps: interest point extraction; interest point matching, keeping only the “best correspondences” with respect to the similarity between local descriptors; and a final pruning step that retains the correspondences consistent with a realistic camera motion (here, with epipolar constraints or a homography transformation). Each step is in itself a delicate task which may endanger the whole process. In particular, repeated patterns give rise to many false correspondences in descriptor-based matching, so that actual correspondences are hardly, if ever, recovered by the final pruning step. Dealing with repeated patterns is of crucial importance in man-made environments. Starting from a statistical model by Moisan and Stival [25] , we have proposed a one-stage approach for matching interest points based simultaneously on descriptor similarity and geometric consistency. The resulting algorithm has adaptive matching thresholds and is able to pick up point correspondences beyond the nearest neighbour. We have also shown how to make ASIFT [26] , an effective point matching algorithm, more robust to the presence of repeated patterns [5] , [23] , [8] .
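To fix ideas, the classical multi-stage pipeline criticised above can be sketched as follows. This is an illustrative NumPy sketch of the two-stage scheme (ratio-test matching on descriptors, then epipolar pruning), not the one-stage adaptive-threshold algorithm proposed in the text; function names and the fixed thresholds are ours.

```python
import numpy as np

def match_ratio_test(desc1, desc2, ratio=0.8):
    """Descriptor matching step: keep a match only if the nearest
    neighbour is clearly better than the second nearest (ratio test).
    Repeated patterns defeat this test: several near-identical
    descriptors push the ratio towards 1, so the match is discarded."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j, k = np.argsort(dists)[:2]
        if dists[j] < ratio * dists[k]:
            matches.append((i, j))
    return matches

def epipolar_filter(matches, pts1, pts2, F, tol=1.0):
    """Final pruning step: keep correspondences consistent with the
    epipolar constraint x2^T F x1 = 0, measured as the point-to-line
    distance (in pixels) in the second image."""
    kept = []
    for i, j in matches:
        x1 = np.append(pts1[i], 1.0)
        x2 = np.append(pts2[j], 1.0)
        line = F @ x1  # epipolar line of x1 in image 2
        d = abs(x2 @ line) / np.hypot(line[0], line[1])
        if d < tol:
            kept.append((i, j))
    return kept
```

The one-stage approach of the paragraph replaces the fixed `ratio` and `tol` with thresholds adapted to the data, and evaluates similarity and geometry jointly rather than sequentially.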

Visual words for pose computation

Visual vocabularies are standard tools in the object/image classification literature, and are emerging as a new tool for building point correspondences for pose estimation. Within S. Bhat's PhD thesis, we have proposed several methods for visual word construction dedicated to point matching, with structure-from-motion and pose estimation applications in view. The three-dimensional geometry of a scene is first extracted with bundle adjustment techniques based on keypoint correspondences. These correspondences are obtained by grouping the set of all SIFT descriptors from the training images into visual words using transitive closure (TC) techniques, which yields a more accurate 3D geometry than classical image-to-image point matching. In a second, on-line step, these visual words serve as viewpoint-robust descriptors of the 3D points: they are used to build 2D-3D correspondences at run time, and the pose of the camera is obtained by solving the PnP problem. Several visual word formation techniques have been compared with respect to robustness to viewpoint change between the learning and the test images. Our experiments showed that the adaptive TC visual words outperform classical techniques such as K-means in many respects [12] .
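The transitive closure grouping can be illustrated with a minimal union-find sketch: two descriptors end up in the same visual word whenever they are linked by a chain of pairwise distances below a threshold. This uses a fixed global threshold for simplicity; the adaptive TC variant discussed in the text tunes the grouping per word.

```python
import numpy as np

def transitive_closure_words(descriptors, threshold):
    """Group descriptors into visual words by transitive closure:
    descriptors i and j belong to the same word if a chain of pairwise
    distances below `threshold` connects them (union-find)."""
    n = len(descriptors)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(descriptors[i] - descriptors[j]) < threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    words = {}
    for i in range(n):
        words.setdefault(find(i), []).append(i)
    return list(words.values())
```

Unlike K-means, no word count has to be fixed in advance, and elongated clusters of descriptors of the same 3D point (as produced by viewpoint change) are kept together.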

Tracking by synthesis using point features and pyramidal blurring

Tracking-by-synthesis is a promising method for markerless vision-based camera tracking, particularly suitable for Augmented Reality applications. In particular, it is drift-free, viewpoint-invariant and easy to combine with physical sensors such as GPS and inertial sensors. While edge features have been used successfully within the tracking-by-synthesis framework, point features have, to our knowledge, never been used. This is probably due to the fact that real-time corner detectors are weakly repeatable between a camera image and a rendered texture.

We compared the repeatability of the commonly used FAST, Harris and SURF interest point detectors across view synthesis [17] . We showed that adding depth blur to the rendered texture can drastically improve the repeatability of the FAST and Harris corner detectors (by up to 100% in our experiments), which can be very helpful, e.g., to make tracking-by-synthesis run on mobile phones. We proposed a method for simulating depth blur on the rendered images using a pre-calibrated depth response curve. In order to meet the performance requirements, a pyramidal approach based on the well-known MIP mapping technique was used. We also proposed an original method for calibrating the depth response curve, which is suitable for any kind of focus lens and comes for free in terms of programming effort once the tracking-by-synthesis algorithm has been implemented.
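The core of the pyramidal approach can be sketched as follows: a pyramid of increasingly blurred renderings plays the role of MIP levels, and a fractional blur level, derived from a depth response curve, is obtained by interpolating between two levels. Both the shape of the response curve and the level convention below are our own illustrative assumptions; the actual curve is calibrated per lens as described in the text.

```python
import numpy as np

def blur_level(depth, focus_depth, k=2.0):
    """Hypothetical depth response curve: blur grows with the defocus
    |1/f - 1/d| of the point relative to the focal plane. The real
    curve is measured with the calibration procedure of the text."""
    return k * abs(1.0 / focus_depth - 1.0 / depth)

def sample_pyramid(pyramid, level):
    """Pyramidal blurring in the spirit of MIP mapping: level i of the
    pyramid holds the rendering blurred with radius ~2**i, and a
    fractional level is linearly interpolated between the two
    surrounding levels (cheap and GPU-friendly)."""
    level = min(max(level, 0.0), len(pyramid) - 1.0)
    lo = int(np.floor(level))
    hi = min(lo + 1, len(pyramid) - 1)
    t = level - lo
    return (1.0 - t) * pyramid[lo] + t * pyramid[hi]
```

In a real renderer the interpolation is done per pixel by the texturing hardware; the sketch only shows the level-selection logic.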

Acquisition of 3D calibrated data

Christel Leonet joined the team in October 2010 as an INRIA assistant engineer with the aim of building an integrated 3D acquisition system. More specifically, the objective of her work is to combine an IMU (Inertial Measurement Unit), a GPS receiver, a laser rangefinder and a video camera for ground-truth acquisition of camera movements and scene structures. These data will be useful to validate several algorithms developed in our team. This year she dealt with the hand-eye calibration between the different devices. Moreover, a 3D laser pointer has been built, which makes it possible to acquire textured 3D polygons by pointing at them with the laser attached to the camera and the IMU mounted on a tripod.
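The geometric idea behind the 3D laser pointer can be sketched in a few lines: the IMU provides the pointing direction and the rangefinder the distance, so the targeted 3D point follows directly. The pan/tilt angle convention and function name below are illustrative assumptions; the actual system must additionally apply the calibrated hand-eye transforms between laser, IMU and camera.

```python
import numpy as np

def laser_point_3d(pan, tilt, distance, origin=np.zeros(3)):
    """Illustrative sketch: convert a pointing direction (pan/tilt in
    radians, from the IMU) and a rangefinder distance into a 3D point
    in the tripod frame. Hand-eye calibration offsets are omitted."""
    direction = np.array([
        np.cos(tilt) * np.cos(pan),
        np.cos(tilt) * np.sin(pan),
        np.sin(tilt),
    ])
    return origin + distance * direction
```

Measuring the vertices of a planar facade polygon this way, together with an image from the attached camera, yields the textured 3D polygons mentioned above.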